Towards Testing the Syntax of Punctuation

Author

Bernard E. M. Jones*

Abstract

Little work has been done in NLP on the subject of punctuation, owing mainly to the lack of a good theory on which computational treatments could be based. This paper describes early work in progress towards constructing such a theory. Two approaches to finding the syntactic function of punctuation marks are discussed, and procedures are described by which the results of these approaches can be tested and evaluated both against each other and against other work. Suggestions are made for the use of these results and for future work.

* This work was carried out under an award from the (UK) ESRC. Thanks are also due to Lex Holt, Henry Thompson, Ted Briscoe and anonymous reviewers.

1 Background

The field of punctuation has been almost completely ignored within Natural Language Processing, with the possible exception of the sentence-final full-stop (period). This is because there is no coherent theory of punctuation on which a computational treatment could be based. As a result, most contemporary systems simply strip punctuation out of input text and do not put any marks into generated text.

Intuitively, this seems very wrong, since punctuation is such an integral part of many written languages. If real-world text (a newspaper, for example) were to appear without any punctuation marks, it would seem very stilted, ambiguous or infantile. It is therefore likely that any computational system that ignores these extra textual cues will suffer a degradation in performance, or at the very least a severe restriction in the class of linguistic data it is able to process.

Several studies have already shown the potential for using punctuation within NLP. Dale (1991) has shown the benefits of using punctuation in the fields of discourse structure and semantics, and Jones (1994) has shown in the field of syntax that a grammar which includes punctuation yields around two orders of magnitude fewer parses than one which does not.
Further work has been carried out in this area, particularly by Briscoe and Carroll (1995), to show more accurately the contribution that punctuation can make to the syntactic analysis of text. The main problem with these studies is that little is available in the way of a theory of punctuation on which computational treatments could be based, so their treatments are somewhat ad hoc and idiosyncratic. The only account of punctuation available is that of Nunberg (1990), which, although it provides a useful basis for a theory, is a little too vague to serve as the basis of any implementation. It therefore seems necessary to develop a new theory of punctuation that is suitable for computational implementation. Some work has already been carried out showing the variety of punctuation marks and their orthographic interaction (Jones, 1995). This paper describes the continuation of that research, which aims to determine the true syntactic function of punctuation marks in text.

There are two possible angles on the problem of the syntactic function of punctuation: an observational one and a theoretical one. Both approaches were adopted so that the results of each could be evaluated against those of the other, and in the hope that the results of both could be combined. The two approaches are therefore described in turn here.

2 Corpus-based Approach

The best data source for observing grammatical punctuation usage is a large, parsed corpus. It ensures that a wide range of real language is covered, and because of its size it should minimise the effect of any
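Observing punctuation in a parsed corpus can begin by collecting the phrase-structure rules in which punctuation marks occur. The sketch below is a minimal Python illustration of that idea and is not the procedure used in the paper. It assumes Penn-Treebank-style bracketed parses and that treebank's punctuation tags (`,`, `.`, `:` and so on). It counts every rule whose right-hand side contains one of those tags. The function names (`tokenize`, `parse`, `productions`, `punctuation_rules`) and the two sample trees are hypothetical.

```python
from collections import Counter

# Assumption: Penn-Treebank-style punctuation tags; other corpora use other tag sets.
PUNCT_TAGS = {",", ".", ":", "``", "''", "-LRB-", "-RRB-"}


def tokenize(bracketed):
    """Split a bracketed parse string into '(', ')' and label/word tokens."""
    return bracketed.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build a nested (label, children) tree; leaves are plain word strings."""
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        if tokens[pos] == "(":
            label = "ROOT"  # treebank files often wrap each tree in an unlabelled bracket
        else:
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return node()


def productions(tree):
    """Yield (parent, child labels) for every phrasal node, skipping preterminals."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return  # a preterminal such as (, ,) contributes no phrase-structure rule
    yield label, tuple(c[0] if isinstance(c, tuple) else c for c in children)
    for c in children:
        if isinstance(c, tuple):
            yield from productions(c)


def punctuation_rules(parses):
    """Count rules whose right-hand side contains at least one punctuation tag."""
    counts = Counter()
    for text in parses:
        for lhs, rhs in productions(parse(tokenize(text))):
            if any(sym in PUNCT_TAGS for sym in rhs):
                counts[(lhs, rhs)] += 1
    return counts


# Hypothetical sample parses, invented for illustration only.
sample = [
    "(S (NP (NNP Dale)) (VP (VBD argued) (NP (DT this))) (. .))",
    "(S (S (NP (PRP It)) (VP (VBD rained))) (, ,) (CC but)"
    " (S (NP (PRP we)) (VP (VBD left))) (. .))",
]

if __name__ == "__main__":
    for (lhs, rhs), n in punctuation_rules(sample).most_common():
        print(n, lhs, "->", " ".join(rhs))
```

On a real treebank, the resulting counts show which constituents punctuation marks attach to and how often. An observational account of their syntactic function could build on that distribution. A full implementation would also need to handle tree-reading details such as trace and null elements, which this sketch ignores.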




Publication date: 1996